AWS Bedrock API Guide
Table of Contents
- Overview
- Prerequisites
- Authentication
- Available Models
- Basic API Usage
- Message Structure and Roles
- System Prompts: Controlling Model Behavior
- Converse API: Unified Interface for All Models
- Guardrails: Content Filtering and Safety
- Inference Parameters by Model
- Understanding Tokens
- Common Inference Parameters Explained
- Error Handling
- Best Practices
- Cost Optimization
- Additional Resources
- Example: Complete Implementation
- Conclusion
Overview
Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. This guide covers how to use the Bedrock APIs and configure inference parameters.
Prerequisites
- AWS Account with Bedrock access
- AWS CLI configured or SDK installed
- Appropriate IAM permissions for Bedrock
- Model access enabled in your AWS region
Authentication
import boto3
# Create a Bedrock client
bedrock = boto3.client(
service_name='bedrock-runtime',
region_name='us-east-1'
)
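By default, boto3 resolves credentials through the standard AWS credential chain (environment variables, shared credentials file, or an attached IAM role). If you authenticate with a named profile instead, a session-based client is a minimal sketch; the profile name here is a placeholder:
import boto3

# Hypothetical profile name - replace with a profile configured in ~/.aws/config
session = boto3.Session(profile_name="bedrock-dev", region_name="us-east-1")
bedrock = session.client("bedrock-runtime")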
Available Models
Bedrock provides access to multiple foundation models:
- Amazon Titan - Text and embeddings models
- Anthropic Claude - Claude 3 (Opus, Sonnet, Haiku), Claude 2.x
- AI21 Labs Jurassic - Jurassic-2 models
- Cohere - Command and Embed models
- Meta Llama - Llama 2 and Llama 3 models
- Stability AI - Stable Diffusion models
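You can list the foundation models offered in your region with the Bedrock control-plane client, as sketched below. Note that a model appearing in this list does not guarantee you have been granted access to it; model access is enabled separately in the console. The provider filter is optional.
import boto3

bedrock_control = boto3.client("bedrock", region_name="us-east-1")

# List foundation models, optionally filtered by provider
models = bedrock_control.list_foundation_models(byProvider="Anthropic")
for model in models["modelSummaries"]:
    print(model["modelId"], "-", model.get("modelName", ""))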
Basic API Usage
Invoke Model (Synchronous)
import json
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
prompt = "What is machine learning?"
# Format request body based on model
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": prompt
}
]
}
response = bedrock.invoke_model(
modelId=model_id,
body=json.dumps(request_body)
)
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
Invoke Model with Response Stream
Streaming allows you to receive model responses incrementally as they're generated, rather than waiting for the complete response. This is crucial for:
- Better user experience - Users see output immediately
- Long responses - Start processing before completion
- Real-time applications - Chat interfaces, live content generation
- Reduced perceived latency - Feels faster even if total time is similar
import json
response = bedrock.invoke_model_with_response_stream(
modelId=model_id,
body=json.dumps(request_body)
)
# Process the stream
stream = response.get('body')
if stream:
for event in stream:
chunk = event.get('chunk')
if chunk:
chunk_data = json.loads(chunk.get('bytes').decode())
# For Claude models
if chunk_data.get('type') == 'content_block_delta':
text = chunk_data.get('delta', {}).get('text', '')
print(text, end='', flush=True)
# Check for completion
if chunk_data.get('type') == 'message_stop':
print("\n[Stream completed]")
Advanced Streaming Example with Error Handling:
def stream_bedrock_response(bedrock_client, model_id, request_body):
"""
Stream response from Bedrock with proper error handling
"""
try:
response = bedrock_client.invoke_model_with_response_stream(
modelId=model_id,
body=json.dumps(request_body)
)
full_response = ""
stream = response.get('body')
for event in stream:
chunk = event.get('chunk')
if chunk:
chunk_data = json.loads(chunk.get('bytes').decode())
# Handle different event types
if chunk_data.get('type') == 'message_start':
print("[Streaming started]")
elif chunk_data.get('type') == 'content_block_start':
print("[Content block started]")
elif chunk_data.get('type') == 'content_block_delta':
delta = chunk_data.get('delta', {})
if delta.get('type') == 'text_delta':
text = delta.get('text', '')
full_response += text
print(text, end='', flush=True)
elif chunk_data.get('type') == 'content_block_stop':
print("\n[Content block completed]")
elif chunk_data.get('type') == 'message_delta':
# Contains usage statistics
usage = chunk_data.get('usage', {})
print(f"\n[Output tokens: {usage.get('output_tokens', 0)}]")
elif chunk_data.get('type') == 'message_stop':
print("[Stream completed]")
break
return full_response
except Exception as e:
print(f"Streaming error: {e}")
raise
# Usage
full_text = stream_bedrock_response(bedrock, model_id, request_body)
Streaming with Titan Models:
response = bedrock.invoke_model_with_response_stream(
modelId="amazon.titan-text-express-v1",
body=json.dumps({
"inputText": "Write a story about AI",
"textGenerationConfig": {
"maxTokenCount": 512,
"temperature": 0.7
}
})
)
for event in response.get('body'):
chunk = json.loads(event['chunk']['bytes'])
if 'outputText' in chunk:
print(chunk['outputText'], end='', flush=True)
Message Structure and Roles
Understanding how to structure messages is fundamental to working with Bedrock models effectively. Messages define the conversation flow and context.
Understanding Message Roles
Every message in a conversation has a role that identifies who is speaking. There are three primary roles:
1. User Role
- Represents the human user or application making requests
- Contains questions, instructions, or prompts
- Always required to start a conversation
{
"role": "user",
"content": [
{"text": "What is machine learning?"}
]
}
2. Assistant Role
- Represents the AI model's responses
- Contains the model's generated text
- Used when building multi-turn conversations
{
"role": "assistant",
"content": [
{"text": "Machine learning is a subset of artificial intelligence..."}
]
}
3. System Role (Special)
- Sets the model's behavior, personality, and constraints
- Not part of the messages array (separate parameter)
- Processed before any user/assistant messages
- Only one system prompt per request
# System prompt is separate from messages
system = [
{"text": "You are a helpful AI assistant specializing in Python programming."}
]
messages = [
{"role": "user", "content": [{"text": "How do I use decorators?"}]}
]
Message Content Structure
Messages use a structured content format that supports different content types:
Text Content (Most Common)
message = {
"role": "user",
"content": [
{
"text": "Explain quantum computing"
}
]
}
Multiple Content Blocks
You can include multiple content blocks in a single message:
message = {
"role": "user",
"content": [
{"text": "Here's my code:"},
{"text": "def hello():\n print('Hello')"},
{"text": "What does it do?"}
]
}
Image Content (Vision Models)
For models that support vision (like Claude 3):
import base64

# Read the image
with open("image.jpg", "rb") as f:
    image_bytes = f.read()

# The Converse-style content block below takes the raw image bytes; boto3
# base64-encodes them on the wire. (The Anthropic-native InvokeModel format
# expects a base64 string instead: base64.b64encode(image_bytes).decode('utf-8'))
message = {
    "role": "user",
    "content": [
        {
            "image": {
                "format": "jpeg",  # or "png", "gif", "webp"
                "source": {
                    "bytes": image_bytes
                }
            }
        },
        {"text": "What's in this image?"}
    ]
}
Building Conversations
Conversations are built by alternating between user and assistant messages:
# Single turn conversation
messages = [
{
"role": "user",
"content": [{"text": "What is Python?"}]
}
]
# Multi-turn conversation
messages = [
# Turn 1
{
"role": "user",
"content": [{"text": "What is Python?"}]
},
{
"role": "assistant",
"content": [{"text": "Python is a high-level programming language..."}]
},
# Turn 2
{
"role": "user",
"content": [{"text": "What are its main features?"}]
},
{
"role": "assistant",
"content": [{"text": "Python's main features include..."}]
},
# Turn 3 (current)
{
"role": "user",
"content": [{"text": "Show me an example"}]
}
]
Message Validation Rules
Bedrock enforces specific rules for message structure:
Rule 1: Alternating Roles
Messages must alternate between user and assistant:
# ✅ VALID
messages = [
{"role": "user", "content": [{"text": "Hello"}]},
{"role": "assistant", "content": [{"text": "Hi there!"}]},
{"role": "user", "content": [{"text": "How are you?"}]}
]
# ❌ INVALID - Two user messages in a row
messages = [
{"role": "user", "content": [{"text": "Hello"}]},
{"role": "user", "content": [{"text": "Are you there?"}]}
]
Solution: Combine multiple user inputs into one message:
# ✅ VALID - Combined into single user message
messages = [
{
"role": "user",
"content": [
{"text": "Hello"},
{"text": "Are you there?"}
]
}
]
Rule 2: Start with User
Conversations must always start with a user message:
# ✅ VALID
messages = [
{"role": "user", "content": [{"text": "Hello"}]}
]
# ❌ INVALID - Starts with assistant
messages = [
{"role": "assistant", "content": [{"text": "Hello"}]}
]
Rule 3: End with User
To get a new response, the last message should be from the user (the message you want the model to answer):
# ✅ VALID - Ends with user message
messages = [
    {"role": "user", "content": [{"text": "What is AI?"}]},
    {"role": "assistant", "content": [{"text": "AI is..."}]},
    {"role": "user", "content": [{"text": "Tell me more"}]}
]
# ⚠️ Ends with assistant - may be accepted (Claude treats a trailing assistant
# message as a prefill of its reply), but you won't get an answer to a new question
messages = [
    {"role": "user", "content": [{"text": "What is AI?"}]},
    {"role": "assistant", "content": [{"text": "AI is..."}]}
]
Rule 4: Content Must Not Be Empty
Every message must have at least one content block:
# ✅ VALID
{"role": "user", "content": [{"text": "Hello"}]}
# ❌ INVALID - Empty content
{"role": "user", "content": []}
Practical Message Management
Here's a helper class for managing message structure:
class MessageBuilder:
"""Helper class for building valid Bedrock message structures"""
def __init__(self):
self.messages = []
def add_user_message(self, text: str):
"""Add a user message"""
self.messages.append({
"role": "user",
"content": [{"text": text}]
})
return self
def add_assistant_message(self, text: str):
"""Add an assistant message"""
self.messages.append({
"role": "assistant",
"content": [{"text": text}]
})
return self
    def add_user_message_with_image(self, text: str, image_bytes: bytes, image_format: str = "jpeg"):
        """Add a user message with an image (raw bytes, Converse-style content block)"""
        self.messages.append({
            "role": "user",
            "content": [
                {
                    "image": {
                        "format": image_format,
                        "source": {"bytes": image_bytes}
                    }
                },
                {"text": text}
            ]
        })
        return self
def validate(self) -> bool:
"""Validate message structure"""
if not self.messages:
return False
# Must start with user
if self.messages[0]["role"] != "user":
return False
# Must end with user
if self.messages[-1]["role"] != "user":
return False
# Check alternating roles
for i in range(len(self.messages) - 1):
current_role = self.messages[i]["role"]
next_role = self.messages[i + 1]["role"]
if current_role == next_role:
return False
return True
def get_messages(self):
"""Get the messages list"""
if not self.validate():
raise ValueError("Invalid message structure")
return self.messages
def clear(self):
"""Clear all messages"""
self.messages = []
return self
# Usage
builder = MessageBuilder()
builder.add_user_message("What is Python?")
builder.add_assistant_message("Python is a programming language...")
builder.add_user_message("Show me an example")
messages = builder.get_messages()
System Prompts: Controlling Model Behavior
System prompts are one of the most powerful tools for controlling how AI models behave. They set the context, personality, and constraints for the entire conversation.
What Are System Prompts?
A system prompt is a special instruction given to the model before any user messages. It defines:
- Who the model is (role/persona)
- How it should behave (tone, style)
- What it should do (tasks, constraints)
- What it shouldn't do (limitations, boundaries)
Think of it as the model's "job description" for the conversation.
How System Prompts Work
System prompts are processed differently from regular messages:
# Traditional approach (InvokeModel with Claude)
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"system": "You are a helpful Python programming assistant.", # System prompt
"messages": [
{"role": "user", "content": "How do I use decorators?"}
],
"max_tokens": 1024
}
# Converse API approach
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
system=[
{"text": "You are a helpful Python programming assistant."}
],
messages=[
{"role": "user", "content": [{"text": "How do I use decorators?"}]}
]
)
Key characteristics:
- System prompts are always processed first
- They apply to the entire conversation
- They don't count as a conversation turn
- They strongly influence model behavior
System Prompt Best Practices
1. Be Specific and Clear
# ❌ Vague
system_prompt = "Be helpful."
# ✅ Specific
system_prompt = """You are a Python programming tutor.
When explaining concepts:
- Use simple language suitable for beginners
- Provide code examples for every concept
- Explain what each line of code does
- Suggest exercises for practice"""
2. Define the Role/Persona
# Customer service bot
system_prompt = """You are a friendly customer service representative for TechCorp.
- Always greet customers warmly
- Be patient and empathetic
- Provide clear, step-by-step solutions
- If you can't help, offer to escalate to a human agent
- Never make promises about refunds or replacements without checking policies"""
3. Set Boundaries and Constraints
system_prompt = """You are a medical information assistant.
What you CAN do:
- Provide general health information
- Explain medical terms
- Suggest when to see a doctor
What you CANNOT do:
- Diagnose conditions
- Prescribe medications
- Replace professional medical advice
Always remind users to consult healthcare professionals for personal medical advice."""
4. Specify Output Format
system_prompt = """You are a code review assistant.
For each code review, provide:
1. Overall assessment (Good/Needs Improvement/Poor)
2. Strengths (bullet points)
3. Issues found (bullet points with severity: Critical/Major/Minor)
4. Specific recommendations
5. Refactored code example (if needed)
Use markdown formatting for clarity."""
System Prompt Examples by Use Case
Code Assistant
system_prompt = """You are an expert software engineer specializing in Python, JavaScript, and system design.
Guidelines:
- Write clean, idiomatic code following best practices
- Include error handling and edge cases
- Add clear comments explaining complex logic
- Suggest performance optimizations when relevant
- Consider security implications
- Provide type hints for Python code
- Follow PEP 8 style guide for Python
- Use ES6+ features for JavaScript
When reviewing code:
- Point out bugs and potential issues
- Suggest improvements for readability and maintainability
- Explain the reasoning behind your suggestions"""
Content Writer
system_prompt = """You are a professional content writer specializing in technical blog posts.
Writing style:
- Clear and engaging tone
- Use active voice
- Short paragraphs (3-4 sentences max)
- Include relevant examples
- Add subheadings for structure
- Use bullet points for lists
- Write for a technical but not expert audience
Structure:
1. Compelling introduction with a hook
2. Main content with clear sections
3. Practical examples or code snippets
4. Key takeaways or conclusion
5. Call to action
Avoid:
- Jargon without explanation
- Overly long sentences
- Passive voice
- Fluff or filler content"""
Data Analyst
system_prompt = """You are a data analyst expert helping users understand and analyze data.
When analyzing data:
1. Start with summary statistics
2. Identify patterns and trends
3. Point out anomalies or outliers
4. Suggest relevant visualizations
5. Provide actionable insights
When writing code:
- Use pandas for data manipulation
- Use matplotlib/seaborn for visualization
- Include comments explaining each step
- Handle missing data appropriately
- Validate assumptions
Always explain your analytical approach and reasoning."""
Educational Tutor
system_prompt = """You are a patient and encouraging tutor for high school mathematics.
Teaching approach:
- Break down complex problems into simple steps
- Use analogies and real-world examples
- Check understanding before moving forward
- Encourage students when they struggle
- Celebrate correct answers
- Guide students to find answers rather than giving them directly
When a student makes a mistake:
- Don't just say it's wrong
- Help them identify where they went wrong
- Guide them to the correct approach
- Reinforce the underlying concept
Use encouraging language like:
- "Great start! Let's think about..."
- "You're on the right track..."
- "That's a common mistake, let's see why..."
"""
API Documentation Helper
system_prompt = """You are an API documentation expert helping developers understand and use APIs.
When explaining APIs:
1. Provide endpoint URL and HTTP method
2. List all parameters (required vs optional)
3. Show request example with sample data
4. Show response example with explanation
5. List possible error codes and meanings
6. Include authentication requirements
7. Provide code examples in multiple languages (Python, JavaScript, cURL)
Format responses as:
- Clear section headers
- Code blocks with syntax highlighting
- Tables for parameters
- Warning boxes for important notes
Always include working, copy-paste ready examples."""
System Prompt Management Strategies
Strategy 1: Template-Based System Prompts
Create reusable templates with placeholders:
class SystemPromptTemplates:
"""Reusable system prompt templates"""
CUSTOMER_SERVICE = """You are a {tone} customer service representative for {company_name}.
Product knowledge:
{product_info}
Policies:
{policies}
Response guidelines:
- Always greet customers by name if provided
- Be {tone} and professional
- Provide solutions within {response_time}
- Escalate to human if: {escalation_criteria}"""
CODE_REVIEWER = """You are a {language} code reviewer with {experience_level} expertise.
Focus areas:
{focus_areas}
Standards to enforce:
{coding_standards}
Severity levels:
- Critical: {critical_definition}
- Major: {major_definition}
- Minor: {minor_definition}"""
@staticmethod
def create_customer_service_prompt(company_name, tone="friendly", **kwargs):
return SystemPromptTemplates.CUSTOMER_SERVICE.format(
company_name=company_name,
tone=tone,
product_info=kwargs.get('product_info', 'General products'),
policies=kwargs.get('policies', 'Standard policies'),
response_time=kwargs.get('response_time', '24 hours'),
escalation_criteria=kwargs.get('escalation_criteria', 'Complex issues')
)
# Usage
prompt = SystemPromptTemplates.create_customer_service_prompt(
company_name="TechCorp",
tone="professional and empathetic",
product_info="Cloud hosting services",
policies="30-day money-back guarantee, 24/7 support"
)
Strategy 2: Layered System Prompts
Build complex prompts from modular components:
class SystemPromptBuilder:
"""Build system prompts from modular components"""
def __init__(self):
self.components = []
def add_role(self, role: str):
"""Define the model's role"""
self.components.append(f"You are {role}.")
return self
def add_expertise(self, areas: list):
"""Define areas of expertise"""
expertise = "Your areas of expertise include:\n" + "\n".join(f"- {area}" for area in areas)
self.components.append(expertise)
return self
def add_guidelines(self, guidelines: list):
"""Add behavioral guidelines"""
guide_text = "Guidelines:\n" + "\n".join(f"- {g}" for g in guidelines)
self.components.append(guide_text)
return self
def add_constraints(self, constraints: list):
"""Add constraints/limitations"""
constraint_text = "Constraints:\n" + "\n".join(f"- {c}" for c in constraints)
self.components.append(constraint_text)
return self
def add_output_format(self, format_description: str):
"""Specify output format"""
self.components.append(f"Output format:\n{format_description}")
return self
def add_examples(self, examples: list):
"""Add example interactions"""
example_text = "Examples:\n" + "\n\n".join(examples)
self.components.append(example_text)
return self
def build(self) -> str:
"""Build the final system prompt"""
return "\n\n".join(self.components)
# Usage
prompt = (SystemPromptBuilder()
.add_role("an expert Python developer and teacher")
.add_expertise([
"Python programming (beginner to advanced)",
"Web development with Django and Flask",
"Data science with pandas and numpy",
"Best practices and design patterns"
])
.add_guidelines([
"Explain concepts clearly with examples",
"Write clean, well-commented code",
"Consider edge cases and error handling",
"Suggest best practices and optimizations"
])
.add_constraints([
"Only provide Python 3.8+ compatible code",
"Avoid deprecated features",
"Don't use external libraries unless necessary"
])
.add_output_format("""
1. Brief explanation of the concept
2. Code example with comments
3. Expected output
4. Common pitfalls to avoid
""")
.build()
)
print(prompt)
Strategy 3: Dynamic System Prompts
Adjust system prompts based on context:
class DynamicSystemPrompts:
"""Generate context-aware system prompts"""
@staticmethod
def for_user_level(user_level: str, domain: str):
"""Generate prompt based on user expertise level"""
level_configs = {
"beginner": {
"tone": "patient and encouraging",
"detail": "Explain every step in detail",
"examples": "Use simple, relatable examples",
"jargon": "Avoid technical jargon or explain it clearly"
},
"intermediate": {
"tone": "professional and informative",
"detail": "Provide clear explanations with some technical depth",
"examples": "Use practical, real-world examples",
"jargon": "Use technical terms but explain complex ones"
},
"advanced": {
"tone": "technical and precise",
"detail": "Focus on advanced concepts and edge cases",
"examples": "Use sophisticated examples and best practices",
"jargon": "Use technical terminology freely"
}
}
config = level_configs.get(user_level, level_configs["intermediate"])
return f"""You are a {config['tone']} {domain} expert.
Communication style:
- {config['detail']}
- {config['examples']}
- {config['jargon']}
Adjust your responses to match the {user_level} level of expertise."""
@staticmethod
def for_task_type(task_type: str):
"""Generate prompt based on task type"""
task_prompts = {
"debug": """You are a debugging expert.
Approach:
1. Analyze the error message carefully
2. Identify the root cause
3. Explain why the error occurred
4. Provide the fix with explanation
5. Suggest how to prevent similar issues
Be systematic and thorough.""",
"optimize": """You are a performance optimization expert.
Approach:
1. Analyze current implementation
2. Identify bottlenecks
3. Suggest optimizations with trade-offs
4. Provide benchmarking approach
5. Consider scalability
Focus on measurable improvements.""",
"design": """You are a software architecture expert.
Approach:
1. Understand requirements thoroughly
2. Consider scalability and maintainability
3. Suggest design patterns where appropriate
4. Discuss trade-offs of different approaches
5. Provide clear diagrams or pseudocode
Think long-term and holistically."""
}
return task_prompts.get(task_type, "You are a helpful assistant.")
# Usage
prompt = DynamicSystemPrompts.for_user_level("beginner", "Python programming")
# or
prompt = DynamicSystemPrompts.for_task_type("debug")
Advanced System Prompt Techniques
Technique 1: Few-Shot Examples in System Prompts
Include examples of desired behavior:
system_prompt = """You are a sentiment analysis assistant.
Analyze the sentiment of user messages and respond in this exact format:
Example 1:
User: "I love this product! It's amazing!"
Analysis: Positive (confidence: 95%)
Key emotions: joy, satisfaction
Tone: enthusiastic
Example 2:
User: "This is terrible. Worst purchase ever."
Analysis: Negative (confidence: 98%)
Key emotions: anger, disappointment
Tone: frustrated
Example 3:
User: "It's okay, I guess. Nothing special."
Analysis: Neutral (confidence: 75%)
Key emotions: indifference
Tone: lukewarm
Now analyze user messages following this exact format."""
Technique 2: Chain-of-Thought Prompting
Encourage step-by-step reasoning:
system_prompt = """You are a math problem solver.
For every problem, follow this thinking process:
1. UNDERSTAND: Restate the problem in your own words
2. PLAN: Identify what approach or formula to use
3. SOLVE: Work through the solution step-by-step
4. CHECK: Verify your answer makes sense
Show your work for each step. Think out loud.
Example:
Problem: "If a train travels 120 miles in 2 hours, what's its average speed?"
UNDERSTAND: We need to find average speed given distance and time.
PLAN: Use the formula: speed = distance / time
SOLVE:
- Distance = 120 miles
- Time = 2 hours
- Speed = 120 / 2 = 60 miles per hour
CHECK: 60 mph × 2 hours = 120 miles ✓
Answer: 60 miles per hour"""
Technique 3: Role-Playing with Constraints
Create specific personas with detailed constraints:
system_prompt = """You are Sherlock Holmes, the famous detective.
Personality traits:
- Highly observant and analytical
- Sometimes condescending but well-meaning
- Uses deductive reasoning
- References obscure knowledge
- Speaks in Victorian English style
When analyzing problems:
- Point out details others miss
- Make logical deductions
- Explain your reasoning process
- Occasionally reference past cases
- Show confidence in your conclusions
Speech patterns:
- "Elementary, my dear Watson"
- "I observe that..."
- "It is quite evident that..."
- "The facts are these..."
Stay in character at all times."""
Technique 4: Structured Output Enforcement
Force specific output structures:
system_prompt = """You are a code review bot.
You MUST respond in this exact JSON structure:
{
"overall_score": <number 1-10>,
"summary": "<one sentence summary>",
"strengths": [
"<strength 1>",
"<strength 2>"
],
"issues": [
{
"severity": "<critical|major|minor>",
"line": <line number>,
"description": "<issue description>",
"suggestion": "<how to fix>"
}
],
"recommendations": [
"<recommendation 1>",
"<recommendation 2>"
]
}
Do not include any text outside this JSON structure.
Ensure the JSON is valid and properly formatted."""
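When you enforce a JSON structure like this, validate the model's reply before using it downstream. The following is a minimal sketch; parse_review_json is a hypothetical helper, and response_text is assumed to come from one of the invoke_model or converse calls shown elsewhere in this guide.
import json

def parse_review_json(response_text: str) -> dict:
    """Parse the model's JSON reply, raising a clear error if the structure is invalid."""
    try:
        review = json.loads(response_text)
    except json.JSONDecodeError as e:
        raise ValueError(f"Model did not return valid JSON: {e}") from e
    # Sanity-check the fields the system prompt asked for
    required_keys = {"overall_score", "summary", "strengths", "issues", "recommendations"}
    missing = required_keys - review.keys()
    if missing:
        raise ValueError(f"Missing expected keys: {missing}")
    return review

# Usage (response_text obtained from a model call):
# review = parse_review_json(response_text)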
Converse API: Unified Interface for All Models
The Converse API is a newer, standardized way to interact with Bedrock models. Instead of dealing with model-specific request formats, you use a single, consistent interface that works across all models.
Why Use Converse API?
Traditional approach (InvokeModel):
- Each model has different request/response formats
- You need to know Claude's format vs Titan's format vs Llama's format
- Switching models requires code changes
- More complex to maintain
Converse API approach:
- Single, unified format for all models
- Switch models by just changing the model ID
- Cleaner, more maintainable code
- Built-in support for multi-turn conversations
- Automatic handling of system prompts and tool use
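Because the request shape is identical across providers, switching models becomes a one-line change. A minimal sketch (the model IDs are examples; use models enabled in your account):
def ask(bedrock_client, model_id: str, question: str) -> str:
    """Send the same request to any Converse-compatible model."""
    response = bedrock_client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": question}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.5},
    )
    return response["output"]["message"]["content"][0]["text"]

# Same code, different models - only the ID changes
print(ask(bedrock, "anthropic.claude-3-haiku-20240307-v1:0", "What is machine learning?"))
print(ask(bedrock, "amazon.titan-text-express-v1", "What is machine learning?"))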
Basic Converse API Usage
# Simple conversation with any model
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [
{"text": "What is machine learning?"}
]
}
],
inferenceConfig={
"maxTokens": 512,
"temperature": 0.7,
"topP": 0.9
}
)
# Extract the response
output_message = response['output']['message']
response_text = output_message['content'][0]['text']
print(response_text)
# Check token usage
usage = response['usage']
print(f"Input tokens: {usage['inputTokens']}")
print(f"Output tokens: {usage['outputTokens']}")
print(f"Total tokens: {usage['totalTokens']}")
Multi-Turn Conversations
The Converse API makes it easy to maintain conversation history:
# Build a conversation
conversation_history = []
# First turn
conversation_history.append({
"role": "user",
"content": [{"text": "What is Python?"}]
})
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=conversation_history
)
# Add assistant's response to history
assistant_message = response['output']['message']
conversation_history.append(assistant_message)
print(f"Assistant: {assistant_message['content'][0]['text']}")
# Second turn - model remembers context
conversation_history.append({
"role": "user",
"content": [{"text": "What are its main features?"}]
})
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=conversation_history
)
assistant_message = response['output']['message']
print(f"Assistant: {assistant_message['content'][0]['text']}")
System Prompts with Converse API
System prompts set the behavior and personality of the model:
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": "Explain recursion"}]
}
],
system=[
{"text": "You are an expert computer science teacher. Explain concepts using simple analogies and examples."}
],
inferenceConfig={
"maxTokens": 1000,
"temperature": 0.7
}
)
Converse API with Streaming
Stream responses for better user experience:
def converse_stream(bedrock_client, model_id, messages, inference_config=None):
"""
Stream responses using Converse API
"""
response = bedrock_client.converse_stream(
modelId=model_id,
messages=messages,
inferenceConfig=inference_config or {
"maxTokens": 2048,
"temperature": 0.7
}
)
full_text = ""
# Process the stream
for event in response['stream']:
# Content block delta - this contains the actual text
if 'contentBlockDelta' in event:
delta = event['contentBlockDelta']['delta']
if 'text' in delta:
text = delta['text']
full_text += text
print(text, end='', flush=True)
# Metadata about the response
elif 'metadata' in event:
metadata = event['metadata']
if 'usage' in metadata:
usage = metadata['usage']
print(f"\n\n[Tokens used: {usage['totalTokens']}]")
# Message stop - end of response
elif 'messageStop' in event:
stop_reason = event['messageStop']['stopReason']
print(f"\n[Stopped: {stop_reason}]")
return full_text
# Usage
messages = [
{
"role": "user",
"content": [{"text": "Write a short story about a robot"}]
}
]
result = converse_stream(
bedrock,
"anthropic.claude-3-sonnet-20240229-v1:0",
messages,
{"maxTokens": 1500, "temperature": 0.8}
)
Inference Configuration in Converse API
The inferenceConfig parameter standardizes inference parameters across all models:
inference_config = {
"maxTokens": 2048, # Maximum tokens to generate
"temperature": 0.7, # Randomness (0.0-1.0)
"topP": 0.9, # Nucleus sampling (0.0-1.0)
"stopSequences": ["\n\n", "END"] # Stop generation triggers
}
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=messages,
inferenceConfig=inference_config
)
Note: Not all parameters are supported by all models. The Converse API handles this gracefully by using what's available.
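For parameters that aren't part of the common inferenceConfig (for example, Claude's top_k), the Converse API accepts an additionalModelRequestFields argument that passes model-specific fields through to the model. A minimal sketch, reusing the messages defined above:
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages,
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7
    },
    # Model-specific parameters not covered by the common inferenceConfig
    additionalModelRequestFields={
        "top_k": 250
    }
)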
Complete Converse API Example
import boto3
import json
class ConverseClient:
"""
A clean wrapper around Bedrock's Converse API
"""
def __init__(self, region_name='us-east-1'):
self.client = boto3.client(
service_name='bedrock-runtime',
region_name=region_name
)
self.conversation_history = []
def send_message(
self,
message: str,
model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
system_prompt: str = None,
temperature: float = 0.7,
max_tokens: int = 2048,
stream: bool = False
):
"""
Send a message and get a response
"""
# Add user message to history
self.conversation_history.append({
"role": "user",
"content": [{"text": message}]
})
# Prepare request parameters
request_params = {
"modelId": model_id,
"messages": self.conversation_history,
"inferenceConfig": {
"maxTokens": max_tokens,
"temperature": temperature,
"topP": 0.9
}
}
# Add system prompt if provided
if system_prompt:
request_params["system"] = [{"text": system_prompt}]
# Choose streaming or non-streaming
if stream:
return self._stream_response(request_params)
else:
return self._get_response(request_params)
def _get_response(self, request_params):
"""Get complete response at once"""
response = self.client.converse(**request_params)
# Extract response text
assistant_message = response['output']['message']
response_text = assistant_message['content'][0]['text']
# Add to conversation history
self.conversation_history.append(assistant_message)
# Return response with metadata
return {
'text': response_text,
'usage': response['usage'],
'stop_reason': response['stopReason']
}
def _stream_response(self, request_params):
"""Stream response in real-time"""
response = self.client.converse_stream(**request_params)
full_text = ""
usage_info = None
stop_reason = None
for event in response['stream']:
if 'contentBlockDelta' in event:
delta = event['contentBlockDelta']['delta']
if 'text' in delta:
text = delta['text']
full_text += text
print(text, end='', flush=True)
elif 'metadata' in event:
if 'usage' in event['metadata']:
usage_info = event['metadata']['usage']
elif 'messageStop' in event:
stop_reason = event['messageStop']['stopReason']
print() # New line after streaming
# Add assistant response to history
self.conversation_history.append({
"role": "assistant",
"content": [{"text": full_text}]
})
return {
'text': full_text,
'usage': usage_info,
'stop_reason': stop_reason
}
def reset_conversation(self):
"""Clear conversation history"""
self.conversation_history = []
def get_history(self):
"""Get current conversation history"""
return self.conversation_history
# Usage Example
if __name__ == "__main__":
# Create client
client = ConverseClient()
# Set system prompt for the conversation
system_prompt = "You are a helpful AI assistant specializing in Python programming."
# First message
response = client.send_message(
"What is a decorator in Python?",
system_prompt=system_prompt,
temperature=0.5
)
print(f"Assistant: {response['text']}")
print(f"Tokens used: {response['usage']['totalTokens']}\n")
# Follow-up message (context is maintained)
response = client.send_message(
"Can you show me an example?",
temperature=0.5
)
print(f"Assistant: {response['text']}")
print(f"Tokens used: {response['usage']['totalTokens']}\n")
# Stream a response
print("Assistant: ", end='')
response = client.send_message(
"Explain how decorators work internally",
temperature=0.5,
stream=True
)
print(f"\nTokens used: {response['usage']['totalTokens']}")
Converse API vs InvokeModel: When to Use Each
Use Converse API when:
- Building conversational applications
- You want model-agnostic code
- You need multi-turn conversation support
- You want cleaner, more maintainable code
- You're starting a new project
Use InvokeModel when:
- You need model-specific features not in Converse API
- You're working with existing code
- You need maximum control over request format
- You're using advanced model-specific parameters
Model Compatibility
The Converse API works with:
- ✅ Anthropic Claude 3 (Opus, Sonnet, Haiku)
- ✅ Anthropic Claude 2.x
- ✅ Amazon Titan Text models
- ✅ Meta Llama 2 and 3
- ✅ Mistral AI models
- ✅ Cohere Command models
Check the AWS documentation for the latest compatibility list.
Guardrails: Content Filtering and Safety
AWS Bedrock Guardrails help you implement safeguards for your generative AI applications by filtering harmful content, blocking sensitive information, and enforcing responsible AI practices.
What Are Guardrails?
Guardrails are policies that you define and apply to your Bedrock model interactions to:
- Filter harmful content: Block hate speech, violence, sexual content, etc.
- Protect sensitive data: Prevent PII (Personally Identifiable Information) leakage
- Enforce topic restrictions: Keep conversations on approved topics
- Block denied words: Filter specific words or phrases
- Validate content quality: Ensure responses meet quality standards
Key benefits:
- Centralized policy management
- Consistent enforcement across all models
- Real-time content filtering
- Detailed intervention logging
- Compliance with regulations
Types of Guardrails
1. Content Filters
Filter harmful content across multiple categories:
content_filters = [
{
"type": "HATE", # Hate speech, discrimination
"inputStrength": "HIGH", # HIGH, MEDIUM, LOW, NONE
"outputStrength": "HIGH"
},
{
"type": "INSULTS", # Insults, bullying
"inputStrength": "MEDIUM",
"outputStrength": "HIGH"
},
{
"type": "SEXUAL", # Sexual content
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "VIOLENCE", # Violence, gore
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "MISCONDUCT", # Criminal activity, illegal content
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "PROMPT_ATTACK", # Jailbreak attempts, prompt injection
"inputStrength": "HIGH",
"outputStrength": "NONE"
}
]
Strength levels:
- HIGH: Strictest filtering, blocks most content in the category
- MEDIUM: Balanced filtering, blocks obvious violations
- LOW: Minimal filtering, only extreme cases
- NONE: No filtering for this category
2. Sensitive Information Filters (PII)
Protect personally identifiable information:
pii_filters = [
{
"type": "EMAIL",
"action": "BLOCK" # or "ANONYMIZE"
},
{
"type": "PHONE",
"action": "ANONYMIZE"
},
{
"type": "NAME",
"action": "ANONYMIZE"
},
{
"type": "ADDRESS",
"action": "BLOCK"
},
{
"type": "SSN", # Social Security Number
"action": "BLOCK"
},
{
"type": "CREDIT_DEBIT_CARD_NUMBER",
"action": "BLOCK"
},
{
"type": "IP_ADDRESS",
"action": "ANONYMIZE"
},
{
"type": "DRIVER_ID",
"action": "BLOCK"
},
{
"type": "PASSPORT_NUMBER",
"action": "BLOCK"
},
{
"type": "USERNAME",
"action": "ANONYMIZE"
},
{
"type": "PASSWORD",
"action": "BLOCK"
}
]
Actions:
- BLOCK: Reject the request/response entirely
- ANONYMIZE: Replace with placeholder (e.g., [EMAIL], [PHONE])
3. Denied Topics
Restrict conversations to approved topics:
denied_topics = [
{
"name": "Financial Advice",
"definition": "Providing specific investment recommendations, stock tips, or personalized financial planning advice",
"examples": [
"Should I invest in Bitcoin?",
"What stocks should I buy?",
"How should I allocate my 401k?"
],
"type": "DENY"
},
{
"name": "Medical Diagnosis",
"definition": "Diagnosing medical conditions or prescribing treatments",
"examples": [
"Do I have cancer?",
"What medication should I take for my headache?",
"Is this rash serious?"
],
"type": "DENY"
},
{
"name": "Legal Advice",
"definition": "Providing specific legal counsel or representation",
"examples": [
"Should I sue my employer?",
"How do I file for bankruptcy?",
"What should I say in court?"
],
"type": "DENY"
}
]
4. Word Filters (Profanity/Custom)
Block specific words or phrases:
word_filters = [
{
"text": "badword1"
},
{
"text": "inappropriate phrase"
},
{
"text": "competitor-name"
}
]
Creating Guardrails
Guardrails are created using the Bedrock control plane API:
import boto3
import json
# Create Bedrock client for control plane
bedrock_client = boto3.client('bedrock', region_name='us-east-1')
# Define guardrail configuration
guardrail_config = {
"name": "my-app-guardrail",
"description": "Guardrail for customer-facing chatbot",
"topicPolicyConfig": {
"topicsConfig": [
{
"name": "Medical Advice",
"definition": "Providing medical diagnoses or treatment recommendations",
"examples": [
"Do I have diabetes?",
"What medicine should I take?"
],
"type": "DENY"
},
{
"name": "Financial Advice",
"definition": "Providing specific investment or financial planning advice",
"examples": [
"Should I buy this stock?",
"How should I invest my money?"
],
"type": "DENY"
}
]
},
"contentPolicyConfig": {
"filtersConfig": [
{
"type": "HATE",
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "INSULTS",
"inputStrength": "MEDIUM",
"outputStrength": "HIGH"
},
{
"type": "SEXUAL",
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "VIOLENCE",
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "MISCONDUCT",
"inputStrength": "HIGH",
"outputStrength": "HIGH"
},
{
"type": "PROMPT_ATTACK",
"inputStrength": "HIGH",
"outputStrength": "NONE"
}
]
},
"sensitiveInformationPolicyConfig": {
"piiEntitiesConfig": [
{
"type": "EMAIL",
"action": "ANONYMIZE"
},
{
"type": "PHONE",
"action": "ANONYMIZE"
},
{
"type": "NAME",
"action": "ANONYMIZE"
},
{
"type": "SSN",
"action": "BLOCK"
},
{
"type": "CREDIT_DEBIT_CARD_NUMBER",
"action": "BLOCK"
}
]
},
"wordPolicyConfig": {
"wordsConfig": [
{"text": "badword1"},
{"text": "badword2"}
],
"managedWordListsConfig": [
{"type": "PROFANITY"} # Use AWS managed profanity list
]
},
"blockedInputMessaging": "I cannot process requests containing inappropriate content. Please rephrase your message.",
"blockedOutputsMessaging": "I cannot provide a response to this request as it violates our content policy."
}
# Create the guardrail
response = bedrock_client.create_guardrail(**guardrail_config)
guardrail_id = response['guardrailId']
guardrail_version = response['version']
print(f"Guardrail created: {guardrail_id}")
print(f"Version: {guardrail_version}")
Applying Guardrails to API Calls
Once created, apply guardrails to your model invocations:
With InvokeModel
import boto3
import json
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "Your prompt here"
}
]
}
response = bedrock_runtime.invoke_model(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
body=json.dumps(request_body),
guardrailIdentifier="your-guardrail-id", # Add guardrail
guardrailVersion="1" # or "DRAFT"
)
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
With Converse API
response = bedrock_runtime.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": "Your prompt here"}]
}
],
inferenceConfig={
"maxTokens": 1024,
"temperature": 0.7
},
guardrailConfig={
"guardrailIdentifier": "your-guardrail-id",
"guardrailVersion": "1",
"trace": "enabled" # Enable detailed trace information
}
)
With Streaming
response = bedrock_runtime.converse_stream(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": "Your prompt here"}]
}
],
guardrailConfig={
"guardrailIdentifier": "your-guardrail-id",
"guardrailVersion": "1",
"trace": "enabled"
}
)
for event in response['stream']:
if 'contentBlockDelta' in event:
print(event['contentBlockDelta']['delta']['text'], end='', flush=True)
Guardrail Configuration Examples
Example 1: Customer Service Bot
customer_service_guardrail = {
"name": "customer-service-guardrail",
"description": "Guardrail for customer service chatbot",
"contentPolicyConfig": {
"filtersConfig": [
{"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "HIGH"},
{"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"}
]
},
"sensitiveInformationPolicyConfig": {
"piiEntitiesConfig": [
{"type": "EMAIL", "action": "ANONYMIZE"},
{"type": "PHONE", "action": "ANONYMIZE"},
{"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
{"type": "SSN", "action": "BLOCK"}
]
},
"topicPolicyConfig": {
"topicsConfig": [
{
"name": "Competitor Discussion",
"definition": "Discussing or comparing with competitor products",
"examples": ["How do you compare to CompetitorX?"],
"type": "DENY"
}
]
},
"blockedInputMessaging": "I'm here to help with our products and services. Please keep the conversation respectful.",
"blockedOutputsMessaging": "I apologize, but I cannot provide that information. How else can I assist you?"
}
Example 2: Educational Platform
education_guardrail = {
"name": "education-platform-guardrail",
"description": "Guardrail for K-12 educational platform",
"contentPolicyConfig": {
"filtersConfig": [
{"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"}
]
},
"wordPolicyConfig": {
"managedWordListsConfig": [
{"type": "PROFANITY"}
]
},
"topicPolicyConfig": {
"topicsConfig": [
{
"name": "Inappropriate Content",
"definition": "Content not suitable for K-12 students",
"examples": [
"How to cheat on tests",
"Inappropriate jokes"
],
"type": "DENY"
}
]
},
"blockedInputMessaging": "Let's keep our conversation educational and appropriate. How can I help you learn today?",
"blockedOutputsMessaging": "I can't help with that, but I'd be happy to help you with your studies!"
}
Example 3: Healthcare Information Bot
healthcare_guardrail = {
"name": "healthcare-info-guardrail",
"description": "Guardrail for healthcare information (non-diagnostic)",
"contentPolicyConfig": {
"filtersConfig": [
{"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"}
]
},
"sensitiveInformationPolicyConfig": {
"piiEntitiesConfig": [
{"type": "NAME", "action": "ANONYMIZE"},
{"type": "SSN", "action": "BLOCK"},
{"type": "PHONE", "action": "ANONYMIZE"},
{"type": "EMAIL", "action": "ANONYMIZE"},
{"type": "ADDRESS", "action": "ANONYMIZE"}
]
},
"topicPolicyConfig": {
"topicsConfig": [
{
"name": "Medical Diagnosis",
"definition": "Attempting to diagnose medical conditions",
"examples": [
"Do I have cancer?",
"What disease do I have?",
"Am I sick?"
],
"type": "DENY"
},
{
"name": "Prescription Advice",
"definition": "Recommending specific medications or treatments",
"examples": [
"What medication should I take?",
"Should I stop taking my medicine?",
"What's the right dosage?"
],
"type": "DENY"
}
]
},
"blockedInputMessaging": "I can provide general health information, but I cannot diagnose conditions or prescribe treatments. Please consult a healthcare professional.",
"blockedOutputsMessaging": "For medical advice specific to your situation, please consult with a qualified healthcare provider."
}
Handling Guardrail Interventions
When a guardrail blocks content, you receive specific information about the intervention:
def invoke_with_guardrail_handling(bedrock_client, model_id, messages, guardrail_id):
"""
Invoke model with comprehensive guardrail error handling
"""
try:
response = bedrock_client.converse(
modelId=model_id,
messages=messages,
guardrailConfig={
"guardrailIdentifier": guardrail_id,
"guardrailVersion": "1",
"trace": "enabled"
}
)
# Check if guardrail intervened
if 'trace' in response:
trace = response['trace']
if 'guardrail' in trace:
guardrail_trace = trace['guardrail']
# Input was blocked
if guardrail_trace.get('inputAssessment'):
input_assessment = guardrail_trace['inputAssessment']
# Content policy violations
if 'contentPolicy' in input_assessment:
filters = input_assessment['contentPolicy']['filters']
for filter_item in filters:
if filter_item['action'] == 'BLOCKED':
print(f"Input blocked - {filter_item['type']}: {filter_item['confidence']}")
# Topic policy violations
if 'topicPolicy' in input_assessment:
topics = input_assessment['topicPolicy']['topics']
for topic in topics:
if topic['action'] == 'BLOCKED':
print(f"Input blocked - Topic: {topic['name']}")
# PII detected
if 'sensitiveInformationPolicy' in input_assessment:
pii_entities = input_assessment['sensitiveInformationPolicy']['piiEntities']
for entity in pii_entities:
print(f"PII detected: {entity['type']} - Action: {entity['action']}")
# Output was blocked
if guardrail_trace.get('outputAssessment'):
output_assessment = guardrail_trace['outputAssessment']
print("Output was blocked by guardrail")
return response
except bedrock_client.exceptions.ValidationException as e:
print(f"Validation error: {e}")
return None
except Exception as e:
print(f"Error: {e}")
return None
# Usage
response = invoke_with_guardrail_handling(
bedrock_runtime,
"anthropic.claude-3-sonnet-20240229-v1:0",
[{"role": "user", "content": [{"text": "Your message"}]}],
"your-guardrail-id"
)
Guardrails Best Practices
1. Start with Moderate Settings
# Don't start with all HIGH settings
# ❌ Too restrictive
{"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"}
# ✅ Start balanced, adjust based on testing
{"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "HIGH"}
2. Test Thoroughly
test_cases = [
# Legitimate use cases
"How do I reset my password?",
"What are your business hours?",
# Edge cases
"My email is john@example.com, can you help?",
"I'm frustrated with this service",
# Should be blocked
"You're terrible at your job",
"Tell me how to hack a system"
]
for test in test_cases:
print(f"\nTesting: {test}")
response = invoke_with_guardrail(test)
print(f"Result: {response}")
3. Use Appropriate PII Actions
# For customer service - anonymize to maintain context
{"type": "EMAIL", "action": "ANONYMIZE"} # Becomes [EMAIL]
{"type": "PHONE", "action": "ANONYMIZE"} # Becomes [PHONE]
# For sensitive data - block completely
{"type": "SSN", "action": "BLOCK"}
{"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"}
{"type": "PASSWORD", "action": "BLOCK"}
4. Provide Clear Blocked Messages
# ❌ Vague
"blockedInputMessaging": "Request blocked."
# ✅ Helpful and clear
"blockedInputMessaging": "I cannot process requests with inappropriate content. Please rephrase your message respectfully, and I'll be happy to help."
# ✅ Specific to use case
"blockedInputMessaging": "For your privacy and security, I cannot process messages containing sensitive personal information like credit card numbers or social security numbers."
5. Version Your Guardrails
# Create new version for changes
response = bedrock_client.create_guardrail_version(
guardrailIdentifier="your-guardrail-id",
description="Added financial advice topic restriction"
)
# Test new version before promoting
guardrail_config = {
"guardrailIdentifier": "your-guardrail-id",
"guardrailVersion": "DRAFT", # Test with DRAFT first
"trace": "enabled"
}
# After testing, use specific version in production
guardrail_config = {
"guardrailIdentifier": "your-guardrail-id",
"guardrailVersion": "2", # Stable version
"trace": "enabled"
}
6. Monitor and Iterate
from datetime import datetime

class GuardrailMonitor:
"""Monitor guardrail interventions and adjust policies"""
def __init__(self):
self.interventions = []
def log_intervention(self, intervention_type, details):
"""Log when guardrail blocks content"""
self.interventions.append({
"timestamp": datetime.now(),
"type": intervention_type,
"details": details
})
def get_statistics(self):
"""Analyze intervention patterns"""
stats = {}
for intervention in self.interventions:
type_key = intervention['type']
stats[type_key] = stats.get(type_key, 0) + 1
return stats
def identify_false_positives(self):
"""Flag potential false positives for review"""
# Implement logic to identify patterns
# that might indicate overly strict filtering
pass
# Usage
monitor = GuardrailMonitor()
# In your application
response = invoke_with_guardrail(message)
if response.get('blocked'):
monitor.log_intervention(
response['block_reason'],
{"message": message, "assessment": response['assessment']}
)
# Periodically review
print(monitor.get_statistics())
Monitoring and Logging Guardrails
Enable CloudWatch logging for guardrail activity:
import boto3
logs_client = boto3.client('logs', region_name='us-east-1')
# Create log group for guardrail monitoring
log_group_name = '/aws/bedrock/guardrails'
try:
logs_client.create_log_group(logGroupName=log_group_name)
print(f"Log group created: {log_group_name}")
except logs_client.exceptions.ResourceAlreadyExistsException:
print(f"Log group already exists: {log_group_name}")
# Query guardrail logs
def query_guardrail_logs(start_time, end_time):
"""Query CloudWatch logs for guardrail interventions"""
query = """
fields @timestamp, guardrailId, action, policyType, @message
| filter action = "BLOCKED"
| sort @timestamp desc
| limit 100
"""
response = logs_client.start_query(
logGroupName=log_group_name,
startTime=int(start_time.timestamp()),
endTime=int(end_time.timestamp()),
queryString=query
)
query_id = response['queryId']
# Wait for query to complete
import time
while True:
result = logs_client.get_query_results(queryId=query_id)
if result['status'] == 'Complete':
return result['results']
time.sleep(1)
# Usage
from datetime import datetime, timedelta
end_time = datetime.now()
start_time = end_time - timedelta(hours=24)
blocked_requests = query_guardrail_logs(start_time, end_time)
print(f"Blocked requests in last 24 hours: {len(blocked_requests)}")
Key metrics to monitor:
- Total interventions by type (content, topic, PII, word)
- False positive rate
- User experience impact
- Most common blocked topics
- PII detection frequency
Guardrails provide essential safety and compliance features for production AI applications. Start with moderate settings, test thoroughly, and iterate based on real-world usage patterns.
Inference Parameters by Model
Anthropic Claude Models
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048, # Maximum tokens to generate (required)
"temperature": 0.7, # Randomness (0.0-1.0)
"top_p": 0.9, # Nucleus sampling (0.0-1.0)
"top_k": 250, # Top-k sampling
"stop_sequences": ["\n\n"], # Stop generation at these sequences
"messages": [
{
"role": "user",
"content": "Your prompt here"
}
],
"system": "You are a helpful assistant" # Optional system prompt
}
Parameter Details:
- max_tokens (required): Maximum number of tokens to generate (1-4096 depending on model)
- temperature: Controls randomness. Lower = more focused, higher = more creative
- top_p: Cumulative probability for nucleus sampling
- top_k: Limits vocabulary to top K tokens
- stop_sequences: Array of strings that stop generation when encountered
Amazon Titan Text Models
request_body = {
"inputText": "Your prompt here",
"textGenerationConfig": {
"maxTokenCount": 512, # Max tokens (0-8192)
"temperature": 0.7, # Randomness (0.0-1.0)
"topP": 0.9, # Nucleus sampling (0.0-1.0)
"stopSequences": [] # Stop sequences
}
}
Cohere Command Models
request_body = {
"prompt": "Your prompt here",
"max_tokens": 512, # Max tokens to generate
"temperature": 0.7, # Randomness (0.0-5.0)
"p": 0.9, # Nucleus sampling (0.0-1.0)
"k": 0, # Top-k sampling (0-500)
"stop_sequences": [], # Stop sequences
"return_likelihoods": "NONE" # NONE, GENERATION, ALL
}
AI21 Jurassic Models
request_body = {
"prompt": "Your prompt here",
"maxTokens": 512, # Max tokens (1-8191)
"temperature": 0.7, # Randomness (0.0-1.0)
"topP": 0.9, # Nucleus sampling
"stopSequences": [], # Stop sequences
"countPenalty": {
"scale": 0
},
"presencePenalty": {
"scale": 0
},
"frequencyPenalty": {
"scale": 0
}
}
Meta Llama Models
request_body = {
"prompt": "Your prompt here",
"max_gen_len": 512, # Max tokens to generate
"temperature": 0.7, # Randomness (0.0-1.0)
"top_p": 0.9 # Nucleus sampling (0.0-1.0)
}
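Since each provider expects a different request body when you use InvokeModel directly, a small dispatch helper keeps the differences in one place. The sketch below covers the formats shown above (Claude, Titan, Cohere, Llama); it only builds the body and leaves the invoke_model call to you.
def build_request_body(model_id: str, prompt: str, max_tokens: int = 512, temperature: float = 0.7) -> dict:
    """Build a provider-specific request body based on the model ID prefix."""
    if model_id.startswith("anthropic."):
        return {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": temperature,
            "messages": [{"role": "user", "content": prompt}],
        }
    if model_id.startswith("amazon.titan"):
        return {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "temperature": temperature,
            },
        }
    if model_id.startswith("cohere."):
        return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    if model_id.startswith("meta."):
        return {"prompt": prompt, "max_gen_len": max_tokens, "temperature": temperature}
    raise ValueError(f"Unsupported model family: {model_id}")

# Usage
body = build_request_body("anthropic.claude-3-sonnet-20240229-v1:0", "What is machine learning?")
The Converse API described earlier removes the need for this kind of per-model dispatch.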
Understanding Tokens
Before diving into inference parameters, it's essential to understand what tokens are, as they're fundamental to how language models work and how you're billed.
What Are Tokens?
Tokens are the basic units that language models read and generate. Think of them as pieces of words:
- A token can be a whole word: "hello" = 1 token
- Or part of a word: "understanding" = 3 tokens (under, stand, ing)
- Or a character: "🎨" = 1-2 tokens
- Spaces and punctuation are also tokens
Examples:
"Hello, world!" = 4 tokens ["Hello", ",", " world", "!"]
"ChatGPT is amazing" = 5 tokens ["Chat", "G", "PT", " is", " amazing"]
"I'm learning AI" = 5 tokens ["I", "'m", " learning", " AI"]
Why Tokens Matter
- Cost: You're charged per token (input + output)
- Context Limits: Models have maximum token limits (e.g., 200K tokens for Claude 3)
- Performance: More tokens = longer processing time
- Quality: Token limits affect how much context you can provide
Token Estimation
As a rough guide:
- 1 token ≈ 4 characters in English
- 1 token ≈ ¾ of a word
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
Practical Example:
# A typical conversation:
prompt = "Explain quantum computing" # ~4 tokens
response = "Quantum computing uses quantum mechanics..." # ~500 tokens
total_tokens = 504 # This is what you're billed for
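If you just need a rough pre-flight estimate of prompt size (for cost or context-limit checks), the 4-characters-per-token heuristic above can be wrapped in a tiny helper. This is only an approximation; the authoritative counts come from the usage fields in the API response.
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic (English text)."""
    return max(1, len(text) // 4)

prompt = "Explain quantum computing in simple terms."
print(f"Estimated prompt tokens: {estimate_tokens(prompt)}")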
Managing Token Usage
# Always set max_tokens to control costs
request_body = {
"max_tokens": 500, # Limit response length
"messages": [{"role": "user", "content": prompt}]
}
# Monitor token usage in responses
response_body = json.loads(response['body'].read())
usage = response_body.get('usage', {})
print(f"Input tokens: {usage.get('input_tokens')}")
print(f"Output tokens: {usage.get('output_tokens')}")
print(f"Total cost: ${(usage.get('input_tokens') * 0.003 + usage.get('output_tokens') * 0.015) / 1000}")
Common Inference Parameters Explained
Inference parameters control how the model generates text. Understanding these is key to getting the outputs you want.
Temperature: Controlling Randomness
What it does: Temperature controls the randomness of the model's predictions.
How it works: When a model generates the next token, it calculates probabilities for all possible tokens. Temperature adjusts these probabilities:
Low temperature (0.0-0.3): Sharpens the probability distribution
- The model becomes more confident and deterministic
- Strongly favors the most likely token (at 0.0 the output is effectively deterministic)
- Output is consistent and predictable
High temperature (0.8-1.0+): Flattens the probability distribution
- The model considers more options
- Less likely tokens get a chance
- Output is creative and varied
Visual Example:
Next token probabilities at different temperatures:
Temperature 0.1 (Focused):
"the" → 85% ████████████████████
"a" → 10% ███
"an" → 5% ██
Temperature 1.0 (Creative):
"the" → 40% ████████
"a" → 30% ██████
"an" → 20% ████
"my" → 10% ██
When to use:
# Factual tasks: Use LOW temperature (0.0-0.3)
# - Answering questions
# - Summarization
# - Translation
# - Code generation
request_body = {
"temperature": 0.1,
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}
# Output: "The capital of France is Paris." (consistent every time)
# Creative tasks: Use HIGH temperature (0.7-1.0)
# - Story writing
# - Brainstorming
# - Poetry
# - Marketing copy
request_body = {
"temperature": 0.9,
"messages": [{"role": "user", "content": "Write a creative tagline for a coffee shop"}]
}
# Output varies: "Where dreams brew daily" / "Sip, savor, smile" / "Your daily dose of magic"
# Balanced tasks: Use MEDIUM temperature (0.5-0.7)
# - Conversational AI
# - General assistance
# - Explanations
request_body = {
"temperature": 0.6,
"messages": [{"role": "user", "content": "Explain how photosynthesis works"}]
}
Pro tip: Start with 0.7 and adjust based on results. If outputs are too random, decrease. If too repetitive, increase.
Top P (Nucleus Sampling): Controlling Diversity
What it does: Top P limits the model to consider only the most probable tokens whose cumulative probability adds up to P.
How it works: Instead of considering all possible tokens, the model:
1. Sorts tokens by probability (highest to lowest)
2. Adds probabilities until reaching the P threshold
3. Only samples from this subset
Visual Example:
All tokens sorted by probability:
"the" → 40% ████████
"a" → 25% █████
"an" → 15% ███
"this" → 10% ██
"my" → 5% █
"your" → 3%
"our" → 2%
... (hundreds more)
With top_p = 0.8:
✓ "the" (40%) - included (cumulative: 40%)
✓ "a" (25%) - included (cumulative: 65%)
✓ "an" (15%) - included (cumulative: 80%)
✗ "this" (10%) - excluded (would exceed 80%)
✗ All others excluded
Model only chooses from: ["the", "a", "an"]
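The cumulative cutoff above is easy to reproduce. A minimal sketch of the nucleus (top-p) filter, using the made-up probabilities from the example:
def top_p_filter(token_probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        if cumulative >= top_p:
            break
        kept.append(token)
        cumulative += prob
    return kept

probs = {"the": 0.40, "a": 0.25, "an": 0.15, "this": 0.10, "my": 0.05}
print(top_p_filter(probs, 0.8))  # ['the', 'a', 'an']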
When to use:
# Focused, consistent output: Use LOW top_p (0.1-0.5)
request_body = {
"top_p": 0.3,
"temperature": 0.7,
"messages": [{"role": "user", "content": "List the steps to bake bread"}]
}
# Sticks to most likely, conventional responses
# Balanced output: Use MEDIUM top_p (0.7-0.9)
request_body = {
"top_p": 0.85,
"temperature": 0.7,
"messages": [{"role": "user", "content": "Describe a sunset"}]
}
# Good mix of common and interesting word choices
# Creative, diverse output: Use HIGH top_p (0.95-1.0)
request_body = {
"top_p": 0.98,
"temperature": 0.8,
"messages": [{"role": "user", "content": "Write a surreal poem"}]
}
# Considers wider vocabulary, more unexpected choices
Relationship with Temperature:
- Temperature adjusts the probability distribution
- Top P then selects which tokens to consider
- Use both together for fine control
- Common combination: temperature=0.7, top_p=0.9
Top K: Limiting Vocabulary
What it does: Top K limits the model to only consider the K most likely tokens at each step.
How it works: Simpler than Top P:
1. Sort all tokens by probability
2. Keep only the top K tokens
3. Sample from these K tokens
Visual Example:
With top_k = 3:
All tokens: Top 3 only:
"the" → 40% "the" → 40%
"a" → 25% "a" → 25%
"an" → 15% "an" → 15%
"this" → 10% ← cut off
"my" → 5% ← cut off
... (all others ignored)
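For comparison with the top-p sketch above, the top-k cutoff is just a fixed-size slice (again using the example's made-up probabilities):
def top_k_filter(token_probs, top_k):
    """Keep only the k most probable tokens."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [token for token, _ in ranked[:top_k]]

probs = {"the": 0.40, "a": 0.25, "an": 0.15, "this": 0.10, "my": 0.05}
print(top_k_filter(probs, 3))  # ['the', 'a', 'an']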
When to use:
# Very focused: top_k = 1-10
request_body = {
"top_k": 5,
"messages": [{"role": "user", "content": "What is 2+2?"}]
}
# Extremely deterministic, only most likely words
# Balanced: top_k = 40-100
request_body = {
"top_k": 50,
"messages": [{"role": "user", "content": "Describe a forest"}]
}
# Good variety while avoiding very unlikely words
# Creative: top_k = 200-500
request_body = {
"top_k": 250,
"messages": [{"role": "user", "content": "Invent a new creature"}]
}
# Wider vocabulary, more creative freedom
Top K vs Top P:
- Top K: Fixed number of tokens (e.g., always 50 tokens)
- Top P: Dynamic number based on probability (could be 3 tokens or 100 tokens)
- Top P is generally preferred because it adapts to the situation
- Some models use both together
Max Tokens: Controlling Response Length
What it does: Sets the maximum number of tokens the model can generate in its response.
How it works:
- The model stops generating when it reaches max_tokens
- OR when it naturally completes (hits a stop sequence)
- Whichever comes first
Important considerations:
# Too low: Response gets cut off mid-sentence
request_body = {
"max_tokens": 10,
"messages": [{"role": "user", "content": "Explain machine learning"}]
}
# Output: "Machine learning is a subset of artificial..." [TRUNCATED]
# Too high: leaves room for unnecessarily long (and costly) responses
request_body = {
"max_tokens": 4000,
"messages": [{"role": "user", "content": "What is 2+2?"}]
}
# Output: "4" (only uses ~1 token, but you reserved 4000)
# Just right: Based on expected response
request_body = {
"max_tokens": 500, # ~375 words
"messages": [{"role": "user", "content": "Summarize this article"}]
}
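A truncated response can be detected after the fact: Anthropic's message responses on Bedrock include a stop_reason field, which is 'max_tokens' when the cap cut the reply short. A minimal sketch using the request above:
response = bedrock.invoke_model(modelId=model_id, body=json.dumps(request_body))
response_body = json.loads(response['body'].read())

# 'end_turn' means the model finished naturally; 'max_tokens' means it was cut off
if response_body.get('stop_reason') == 'max_tokens':
    print("Warning: response truncated - consider raising max_tokens or shortening the prompt")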
Setting max_tokens by use case:
# Short answers (50-100 tokens)
"What is the capital of France?"
max_tokens = 50
# Paragraphs (200-500 tokens)
"Explain how neural networks work"
max_tokens = 400
# Essays/Articles (1000-2000 tokens)
"Write a blog post about climate change"
max_tokens = 1500
# Long-form content (2000-4000 tokens)
"Write a detailed tutorial on Python decorators"
max_tokens = 3000
Cost implications:
# Example pricing (Claude 3 Sonnet):
# Input: $0.003 per 1K tokens
# Output: $0.015 per 1K tokens
# Short response (100 tokens)
short_cost = (100 * 0.015) / 1000   # $0.0015
# Long response (2000 tokens)
long_cost = (2000 * 0.015) / 1000   # $0.03
# Over a million requests:
# Short: $1,500
# Long: $30,000
# Setting appropriate max_tokens saves real money!
Stop Sequences: Controlling When to Stop
What it does: Tells the model to stop generating when it encounters specific strings.
How it works:
- The model generates tokens normally
- After each token, it checks whether the output ends with a stop sequence
- If a match is found, it stops immediately (the stop sequence is not included in the output)
Common use cases:
# Stop at paragraph breaks
request_body = {
"stop_sequences": ["\n\n"],
"messages": [{"role": "user", "content": "Write one paragraph about dogs"}]
}
# Ensures only one paragraph is returned
# Stop at specific markers
request_body = {
"stop_sequences": ["END", "---", "###"],
"messages": [{"role": "user", "content": "Generate a code snippet"}]
}
# Stop at conversation turns
request_body = {
"stop_sequences": ["Human:", "User:", "\nQ:"],
"messages": [{"role": "user", "content": "Continue this dialogue"}]
}
# Stop at list completion
request_body = {
"stop_sequences": ["\n\n", "Conclusion", "In summary"],
"messages": [{"role": "user", "content": "List 5 benefits of exercise"}]
}
Practical example:
# Without stop sequence:
prompt = "List 3 fruits:"
response = "1. Apple\n2. Banana\n3. Orange\n\nFruits are nutritious and..."
# Keeps going!
# With stop sequence:
request_body = {
"stop_sequences": ["\n\n"],
"messages": [{"role": "user", "content": "List 3 fruits:"}]
}
response = "1. Apple\n2. Banana\n3. Orange"
# Stops at double newline
Combining Parameters for Optimal Results
The parameter interaction matrix:
# Factual, deterministic responses
factual_config = {
"temperature": 0.1, # Very focused
"top_p": 0.5, # Limited vocabulary
"max_tokens": 300, # Concise
"stop_sequences": ["\n\n"]
}
# Creative, diverse responses
creative_config = {
"temperature": 0.9, # High randomness
"top_p": 0.95, # Wide vocabulary
"max_tokens": 2000, # Room for creativity
"stop_sequences": [] # Let it flow
}
# Balanced, conversational responses
balanced_config = {
"temperature": 0.7, # Moderate randomness
"top_p": 0.9, # Good variety
"max_tokens": 800, # Reasonable length
"stop_sequences": ["\n\nHuman:", "\n\nUser:"]
}
# Code generation
code_config = {
"temperature": 0.2, # Precise
"top_p": 0.8, # Focused on common patterns
"max_tokens": 1500, # Enough for functions
"stop_sequences": ["```\n\n", "# End"]
}
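If you standardize on presets like these, a small lookup keeps them in one place. A minimal sketch, assuming the Anthropic invoke_model body format shown earlier (the TASK_PRESETS name and build_request helper are illustrative, not part of the Bedrock API):
TASK_PRESETS = {
    "factual": factual_config,
    "creative": creative_config,
    "balanced": balanced_config,
    "code": code_config,
}

def build_request(task: str, prompt: str) -> dict:
    """Merge a named preset with a user prompt into a Claude-style request body."""
    preset = TASK_PRESETS.get(task, balanced_config)
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [{"role": "user", "content": prompt}],
        **preset,
    }

request_body = build_request("factual", "When did the Apollo 11 mission land on the Moon?")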
Error Handling
import botocore
try:
response = bedrock.invoke_model(
modelId=model_id,
body=json.dumps(request_body)
)
except botocore.exceptions.ClientError as error:
if error.response['Error']['Code'] == 'ValidationException':
print("Invalid request parameters")
elif error.response['Error']['Code'] == 'ResourceNotFoundException':
print("Model not found or not enabled")
elif error.response['Error']['Code'] == 'ThrottlingException':
print("Rate limit exceeded")
else:
print(f"Error: {error}")
Best Practices
General Best Practices
- Start with default parameters - Use recommended defaults before tuning
- Adjust temperature based on use case:
- Factual tasks: 0.1-0.3
- Creative writing: 0.7-0.9
- General purpose: 0.5-0.7
- Use stop sequences - Prevent unwanted continuation
- Monitor token usage - Control costs by setting appropriate max_tokens
- Handle streaming for long responses - Better user experience
- Implement retry logic - Handle throttling and transient errors
- Cache responses - Reduce API calls for repeated queries (see the sketch below)
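Caching pairs naturally with the stability settings in the next section: deterministic parameters make identical requests return essentially identical answers, so cached results stay useful. A minimal in-memory sketch (a real deployment would more likely use Redis or a database and expire entries):
import hashlib

_response_cache = {}

def cached_invoke(client, model_id, request_body):
    """Return a cached answer for repeated identical requests; otherwise call Bedrock."""
    key = hashlib.sha256(
        (model_id + json.dumps(request_body, sort_keys=True)).encode()
    ).hexdigest()
    if key in _response_cache:
        return _response_cache[key]

    response = client.invoke_model(modelId=model_id, body=json.dumps(request_body))
    text = json.loads(response['body'].read())['content'][0]['text']
    _response_cache[key] = text
    return text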
Achieving Stable, Consistent, and Repeatable Responses
When you need deterministic outputs (same input → same output every time), such as for customer support, automated reporting, compliance, or testing, follow these parameter settings:
The Stability Formula
For maximum stability and repeatability, minimize all sources of randomness:
# Maximum stability configuration
stable_config = {
"temperature": 0.0, # No randomness - always pick most probable token
"top_p": 0.1, # Restrict to top 10% probability mass
"top_k": 1, # Only consider the single most probable token
"max_tokens": 1000
}
# Alternative: Slightly flexible but still very stable
balanced_stable_config = {
"temperature": 0.1, # Minimal randomness
"top_p": 0.2, # Small probability window
"top_k": 5, # Top 5 tokens only
"max_tokens": 1000
}
Parameter-by-Parameter Guide for Stability
1. Temperature: Set to 0.0 - 0.3
Why: Temperature controls randomness. Lower = more deterministic.
# Maximum determinism
{"temperature": 0.0} # Always picks most probable token (greedy decoding)
# Very stable with tiny variation
{"temperature": 0.1} # 99% deterministic, allows minimal variation
# Stable but slightly flexible
{"temperature": 0.3} # Good for factual responses with some natural variation
Effect:
- 0.0: Effectively identical output every time (greedy decoding; minor serving-side variation is still possible)
- 0.1: Nearly identical with minor word choice variations
- 0.3: Consistent meaning but may vary phrasing slightly
2. Top_p: Set to 0.1 - 0.3 or Disable
Why: Restricts token selection to high-probability options only.
# Very restrictive (most stable)
{"top_p": 0.1} # Only top 10% probability mass
# Balanced stability
{"top_p": 0.2} # Top 20% probability mass
# For maximum stability, combine with low temperature
{"temperature": 0.1, "top_p": 0.1}
Note: Some models may not support disabling top_p entirely. Use the lowest value that works.
3. Top_k: Set to 1 - 10
Why: Limits vocabulary to the most probable tokens.
# Maximum determinism (greedy decoding)
{"top_k": 1} # Always pick the single most probable token
# Very stable with minimal variation
{"top_k": 3} # Choose from top 3 tokens only
# Stable but allows some natural variation
{"top_k": 10} # Top 10 tokens - still quite deterministic
Effect:
- top_k = 1: Completely deterministic (same as temperature = 0)
- top_k = 3-5: Highly consistent with minor variations
- top_k = 10: Stable but more natural-sounding
4. Stop Sequences: Use Consistently
Why: Ensures output ends at the same point every time.
# Define clear stop points
{
"stop_sequences": ["\n\n", "END", "---"],
"temperature": 0.1
}
5. Avoid Penalties (If Available)
Why: Penalties introduce variability to avoid repetition.
# For stability, disable penalties
{
"frequency_penalty": 0, # Don't penalize repeated tokens
"presence_penalty": 0 # Don't encourage new topics
}
Note: Bedrock models may not expose these parameters directly, but be aware if using other platforms.
Complete Stability Configurations by Use Case
Use Case 1: Customer Support Bot (Maximum Consistency)
# Converse's inferenceConfig accepts temperature, topP, maxTokens, and stopSequences;
# model-specific options such as top_k go in additionalModelRequestFields
customer_support_config = {
"temperature": 0.1,
"topP": 0.2,
"maxTokens": 500,
"stopSequences": ["\n\nCustomer:", "\n\nAgent:"]
}
# Example usage
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": "How do I reset my password?"}]
}
],
system=[
{"text": "You are a customer support agent. Provide clear, step-by-step instructions."}
],
inferenceConfig=customer_support_config,
additionalModelRequestFields={"top_k": 5}  # Claude-specific top_k
)
# Result: The same question gets a near-identical answer every time
Use Case 2: Automated Reporting (100% Repeatability)
reporting_config = {
"temperature": 0.0, # Zero randomness
"topP": 0.1,
"maxTokens": 2000
}
# For greedy decoding, also pass additionalModelRequestFields={"top_k": 1}
# Example: Generate monthly report
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": f"Generate monthly sales report for: {sales_data}"}]
}
],
system=[
{"text": """Generate a sales report with this exact structure:
1. Executive Summary
2. Key Metrics (bullet points)
3. Top Performers
4. Areas for Improvement
5. Recommendations
Use professional, formal language."""}
],
inferenceConfig=reporting_config
)
# Result: Identical data will produce identical reports
Use Case 3: Compliance/Auditing (Reproducible Outputs)
compliance_config = {
"temperature": 0.0,
"topP": 0.1,
"maxTokens": 1500,
"stopSequences": ["END OF ANALYSIS"]
}
# top_k=1 can be added via additionalModelRequestFields for Claude models
# Example: Compliance check
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": f"Analyze this transaction for compliance: {transaction}"}]
}
],
system=[
{"text": """You are a compliance analyzer. For each transaction, provide:
1. Compliance Status: PASS/FAIL/REVIEW
2. Regulations Checked: [list]
3. Findings: [detailed list]
4. Risk Level: LOW/MEDIUM/HIGH
5. Recommended Action: [specific action]
Be consistent and deterministic in your analysis."""}
],
inferenceConfig=compliance_config
)
# Result: Same transaction always gets same analysis
Use Case 4: API Response Generation (Consistent JSON)
api_response_config = {
"temperature": 0.0,
"topP": 0.1,
"maxTokens": 1000,
"stopSequences": ["\n```"]
}
# top_k=1 can be added via additionalModelRequestFields if needed
# Example: Generate API response
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": f"Convert to JSON: {user_data}"}]
}
],
system=[
{"text": """Convert input to valid JSON with this exact structure:
{
"status": "success",
"data": {...},
"timestamp": "ISO-8601"
}
Output only valid JSON, no explanations."""}
],
inferenceConfig=api_response_config
)
# Result: Same input always produces same JSON structure
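Since Use Case 4 promises machine-readable output, it is worth validating the reply before handing it downstream. A minimal sketch (the parse_json_response helper and its error policy are illustrative, not part of the Bedrock API):
def parse_json_response(response) -> dict:
    """Extract the model's text from a Converse response and parse it as JSON."""
    text = response['output']['message']['content'][0]['text']
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Deterministic settings make this rare, but guard against stray prose
        raise ValueError(f"Model did not return valid JSON: {text[:200]}")

payload = parse_json_response(response)
print(payload["status"])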
Use Case 5: Testing and QA (Reproducible Test Cases)
testing_config = {
"temperature": 0.0,
"topP": 0.1,
"maxTokens": 800
}
# top_k=1 can be added via additionalModelRequestFields if needed
# Example: Generate test cases
def generate_test_case(feature_description):
"""Generate consistent test cases for QA"""
response = bedrock.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[
{
"role": "user",
"content": [{"text": f"Generate test cases for: {feature_description}"}]
}
],
system=[
{"text": """Generate test cases in this format:
Test Case ID: TC-XXX
Description: [clear description]
Preconditions: [list]
Steps: [numbered steps]
Expected Result: [specific outcome]
Priority: HIGH/MEDIUM/LOW
Be consistent and thorough."""}
],
inferenceConfig=testing_config
)
return response
# Result: Same feature description always generates same test cases
Practical Implementation: Stability Helper Class
class StableBedrockClient:
"""
Bedrock client optimized for stable, repeatable responses
"""
STABILITY_PRESETS = {
"maximum": {
"temperature": 0.0,
"top_p": 0.1,
"top_k": 1,
"description": "100% deterministic - identical outputs"
},
"high": {
"temperature": 0.1,
"top_p": 0.2,
"top_k": 5,
"description": "Very stable with minimal variation"
},
"moderate": {
"temperature": 0.3,
"top_p": 0.3,
"top_k": 10,
"description": "Stable but allows natural phrasing"
}
}
def __init__(self, region_name='us-east-1'):
self.client = boto3.client('bedrock-runtime', region_name=region_name)
def invoke_stable(
self,
prompt: str,
model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
stability_level: str = "high",
system_prompt: str = None,
max_tokens: int = 1000
):
"""
Invoke model with stability-optimized parameters
Args:
prompt: User prompt
model_id: Bedrock model ID
stability_level: "maximum", "high", or "moderate"
system_prompt: Optional system prompt
max_tokens: Maximum tokens to generate
Returns:
Stable, repeatable response
"""
# Get stability preset
config = self.STABILITY_PRESETS.get(stability_level, self.STABILITY_PRESETS["high"])
# Build inference config (the Converse API accepts temperature, topP, maxTokens,
# and stopSequences; the preset's top_k would need additionalModelRequestFields)
inference_config = {
"temperature": config["temperature"],
"topP": config["top_p"],
"maxTokens": max_tokens
}
# Build request
request_params = {
"modelId": model_id,
"messages": [
{
"role": "user",
"content": [{"text": prompt}]
}
],
"inferenceConfig": inference_config
}
if system_prompt:
request_params["system"] = [{"text": system_prompt}]
# Invoke
response = self.client.converse(**request_params)
return {
"text": response['output']['message']['content'][0]['text'],
"usage": response['usage'],
"config_used": config
}
def test_stability(self, prompt: str, iterations: int = 5):
"""
Test stability by running same prompt multiple times
Returns:
Dictionary with results and consistency analysis
"""
results = []
for i in range(iterations):
response = self.invoke_stable(prompt, stability_level="maximum")
results.append(response['text'])
# Check if all responses are identical
all_identical = all(r == results[0] for r in results)
unique_responses = len(set(results))
return {
"all_identical": all_identical,
"unique_responses": unique_responses,
"total_iterations": iterations,
"consistency_rate": f"{((iterations - unique_responses + 1) / iterations) * 100:.1f}%",
"responses": results
}
# Usage Examples
client = StableBedrockClient()
# Example 1: Maximum stability
response = client.invoke_stable(
prompt="What is the capital of France?",
stability_level="maximum"
)
print(response['text'])
# Output: "The capital of France is Paris." (always identical)
# Example 2: Test stability
test_results = client.test_stability(
prompt="Explain what machine learning is in one sentence.",
iterations=10
)
print(f"Consistency: {test_results['consistency_rate']}")
print(f"Unique responses: {test_results['unique_responses']}/10")
# Example 3: Customer support with high stability
support_response = client.invoke_stable(
prompt="How do I reset my password?",
stability_level="high",
system_prompt="You are a helpful customer support agent. Provide clear, step-by-step instructions."
)
print(support_response['text'])
Trade-offs and Considerations
Pros of Stable Configuration
✅ Consistency: Same input produces the same output
✅ Predictability: Easier to test and validate
✅ Reliability: Users get consistent information
✅ Compliance: Reproducible for auditing
✅ Debugging: Easier to identify issues
Cons of Stable Configuration
❌ Less Natural: Responses may sound robotic or repetitive
❌ Reduced Creativity: Cannot generate diverse alternatives
❌ Ambiguity Issues: May struggle with open-ended questions
❌ Repetitive Phrasing: Same phrases used repeatedly
❌ Less Engaging: Conversations may feel mechanical
When to Use Stable vs. Creative Configurations
# Use STABLE configuration for:
stable_use_cases = [
"Customer support FAQs",
"Automated reporting",
"Compliance analysis",
"API response generation",
"Testing and QA",
"Data extraction",
"Classification tasks",
"Fact-based Q&A"
]
# Use CREATIVE configuration for:
creative_use_cases = [
"Content writing",
"Brainstorming",
"Story generation",
"Marketing copy",
"Creative problem solving",
"Conversational chat",
"Idea generation",
"Poetry or artistic content"
]
# Use BALANCED configuration for:
balanced_use_cases = [
"General assistance",
"Educational tutoring",
"Code explanation",
"Technical documentation",
"Email drafting",
"Meeting summaries"
]
Verification: Testing Stability
def verify_stability(bedrock_client, prompt, config, iterations=10):
"""
Verify that a configuration produces stable outputs
"""
responses = []
for i in range(iterations):
response = bedrock_client.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": [{"text": prompt}]}],
inferenceConfig=config
)
text = response['output']['message']['content'][0]['text']
responses.append(text)
# Calculate metrics
unique_responses = len(set(responses))
consistency_rate = ((iterations - unique_responses + 1) / iterations) * 100
print(f"Stability Test Results:")
print(f" Total runs: {iterations}")
print(f" Unique responses: {unique_responses}")
print(f" Consistency rate: {consistency_rate:.1f}%")
print(f" Configuration: {config}")
if unique_responses == 1:
print(" ✅ Perfect stability - all responses identical")
elif unique_responses <= 3:
print(" ✅ High stability - minimal variation")
else:
print(" ⚠️ Low stability - consider lowering temperature/top_p/top_k")
return {
"unique_responses": unique_responses,
"consistency_rate": consistency_rate,
"responses": responses
}
# Test different configurations
print("Testing Maximum Stability:")
verify_stability(
bedrock,
"What is 2+2?",
{"temperature": 0.0, "topP": 0.1, "maxTokens": 50},
iterations=10
)
print("\nTesting High Stability:")
verify_stability(
bedrock,
"What is 2+2?",
{"temperature": 0.1, "topP": 0.2, "maxTokens": 50},
iterations=10
)
Key Takeaway
For stable, consistent, and repeatable responses:
- Temperature: 0.0 - 0.3 (lower = more stable)
- Top_p: 0.1 - 0.3 (lower = more stable)
- Top_k: 1 - 10 (lower = more stable)
- Stop sequences: Use consistently
- System prompts: Be specific and structured
Start with temperature=0.1, top_p=0.2, top_k=5 and adjust based on your stability requirements and output quality needs.
Cost Optimization
- Use smaller models when possible (e.g., Claude Haiku vs Opus)
- Set appropriate max_tokens limits
- Implement caching for common queries
- Use batch processing for multiple requests
- Monitor usage with CloudWatch (see the sketch below)
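To act on the CloudWatch item above, you can pull Bedrock's runtime metrics programmatically. A minimal sketch; the AWS/Bedrock namespace, InputTokenCount metric, and ModelId dimension reflect my understanding of the published metrics, so verify the exact names in the CloudWatch console for your region:
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

# Assumption: Bedrock publishes per-model token metrics under AWS/Bedrock;
# check the CloudWatch console for the exact metric and dimension names.
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Bedrock',
    MetricName='InputTokenCount',
    Dimensions=[{'Name': 'ModelId', 'Value': 'anthropic.claude-3-sonnet-20240229-v1:0'}],
    StartTime=start,
    EndTime=end,
    Period=3600,          # hourly buckets
    Statistics=['Sum']
)

for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'])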
Additional Resources
Example: Complete Implementation
import boto3
import json
from typing import Optional
class BedrockClient:
def __init__(self, region_name: str = 'us-east-1'):
self.client = boto3.client(
service_name='bedrock-runtime',
region_name=region_name
)
def invoke_claude(
self,
prompt: str,
max_tokens: int = 1024,
temperature: float = 0.7,
system_prompt: Optional[str] = None
) -> str:
"""Invoke Claude model with specified parameters"""
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": temperature,
"messages": [
{
"role": "user",
"content": prompt
}
]
}
if system_prompt:
request_body["system"] = system_prompt
try:
response = self.client.invoke_model(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
body=json.dumps(request_body)
)
response_body = json.loads(response['body'].read())
return response_body['content'][0]['text']
except Exception as e:
print(f"Error invoking model: {e}")
raise
# Usage
bedrock_client = BedrockClient()
result = bedrock_client.invoke_claude(
prompt="Explain quantum computing in simple terms",
temperature=0.5
)
print(result)
Conclusion
AWS Bedrock provides a powerful, unified interface to multiple foundation models. Understanding inference parameters allows you to fine-tune model behavior for your specific use case, balancing creativity, accuracy, and cost.